Web Mining Accelerated with In-Memory and Column Store Technology

Authors

  • Patrick Hennig
  • Philipp Berger
  • Christoph Meinel
Abstract

Current web mining approaches use massive amounts of commodity hardware and processing time to provide analytics for today’s web. To keep application interaction seamless, these approaches have to rely on pre-aggregated results and indexes to circumvent slow processing on their data stores, e.g., relational databases or document stores. In-memory, column-oriented databases are increasingly used to accelerate business analytics such as financial reports, but applications on large text corpora have so far not benefited from this trend. We argue that although in-memory, column-oriented stores are tailor-made for traditional data schemas, they are also applicable to web mining applications, which mainly consist of raw text information enriched with limited semantic metadata. Thus, we implement a web mining application that stores all of its information in a pure main-memory data store. We observe an acceleration of current web mining queries and identify new opportunities for web mining applications. To evaluate the performance impact, we compare the run time of general web mining tasks on a traditional row-oriented, disk-based database and on a column-oriented, in-memory database, using BlogIntelligence as a representative example of a web mining application.
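
The contrast between row-oriented and column-oriented storage that the abstract relies on can be illustrated with a minimal sketch. The records, field names, and helper functions below are hypothetical and are not taken from BlogIntelligence or from the paper's evaluation; the sketch only shows why an aggregate over a single attribute touches less data in a columnar layout.

```python
# Hypothetical toy data: names and values are illustrative only,
# not taken from BlogIntelligence or the paper.
rows = [
    {"post_id": 1, "blog": "a.example", "text": "in-memory databases", "links": 3},
    {"post_id": 2, "blog": "b.example", "text": "column stores for text", "links": 5},
    {"post_id": 3, "blog": "a.example", "text": "web mining queries", "links": 2},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per attribute."""
    return {key: [record[key] for record in records] for key in records[0]}


def total_links_row_store(records):
    # Row layout: every full record is visited although only
    # the "links" attribute is needed for the aggregate.
    return sum(record["links"] for record in records)


def total_links_column_store(columns):
    # Column layout: only the "links" column is scanned;
    # the (potentially large) "text" column is never touched.
    return sum(columns["links"])


columns = to_columns(rows)
```

Both functions return the same total; the difference lies in how much data each one has to read, which is the effect a column store exploits for analytical queries.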

Similar resources

Customer lifetime value model in an online toy store

Businesses all around the world use different approaches to know their customers, segment them, and formulate suitable strategies for them. One of these approaches is calculating the value of each customer to the company. In this paper, by calculating the Customer Lifetime Value (CLV) for individual customers of an online toy store named Alakdolak, three customer segments are extracted. The level of ...

Interactional effects of bubble size, particle size, and collector dosage on bubble loading in column flotation

The success of flotation operation depends upon the thriving interactions of chemical and physical variables. In this work, the effects of particle size, bubble size, and collector dosage on the bubble loading in a continuous flotation column were investigated. In other words, this work was mainly concerned with the evaluation of the true flotation response to the changes in the operating varia...

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other are often complex and form a network. Discovery of this network of technologies is time consumi...

The Effect of the Slot Length on Beam Vertical Shear in I-Beams with Moment Connections

This paper evaluates the effect of a slot of limited length between the flanges and the web junction of I-shaped beams, in the region of moment connections, on the vertical force and shear stress distribution in the beam flanges and web at the connection section, in comparison with the classical theory of stress distribution. The main purpose of this research is to evaluate the efficiency of the slot in connect...

A Cost-Aware Strategy for Merging Differential Stores in Column-Oriented In-Memory DBMS

Fast execution of analytical and transactional queries in column-oriented in-memory DBMS is achieved by combining a read-optimized data store with a write-optimized differential store. To maintain high read performance, both structures must be merged from time to time. In this paper we describe a new merge algorithm that applies full and partial merge operations based on their costs and improvem...

Publication date: 2013